-
Assessing student responses is a critical task in adaptive educational systems. More specifically, automatically evaluating students' self-explanations contributes to understanding their knowledge state, which is needed for personalized instruction, the crux of adaptive educational systems. To facilitate the development of Artificial Intelligence (AI) and Machine Learning models for automated assessment of learners' self-explanations, annotated datasets are essential. In response to this need, we developed the SelfCode2.0 corpus, which consists of 3,019 pairs of student and expert explanations of Java code snippets, each annotated with semantic similarity, correctness, and completeness scores provided by experts. Alongside the dataset, we also provide performance results obtained with several baseline models based on TF-IDF and Sentence-BERT vector representations. This work aims to enhance the effectiveness of automated assessment tools in programming education and contribute to better understanding and supporting student learning of programming.
Free, publicly-accessible full text available May 14, 2026.
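As an illustration of the kind of baseline described above, the sketch below scores a student explanation against an expert explanation using both TF-IDF and Sentence-BERT cosine similarity. The example sentences and the pretrained model name are illustrative assumptions, not details taken from the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer, util

# Hypothetical student/expert explanation pair (not taken from the corpus)
student = "The loop goes through the array and adds every element to total."
expert = "The for loop iterates over the array and accumulates the sum of its elements in total."

# TF-IDF baseline: cosine similarity over sparse term-weight vectors
tfidf = TfidfVectorizer().fit_transform([student, expert])
tfidf_sim = cosine_similarity(tfidf[0], tfidf[1])[0, 0]

# Sentence-BERT baseline: cosine similarity over dense sentence embeddings
# (the specific pretrained model is an assumption)
sbert = SentenceTransformer("all-MiniLM-L6-v2")
emb = sbert.encode([student, expert], convert_to_tensor=True)
sbert_sim = util.cos_sim(emb[0], emb[1]).item()

print(f"TF-IDF similarity:        {tfidf_sim:.3f}")
print(f"Sentence-BERT similarity: {sbert_sim:.3f}")
```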
-
Code completion problems are an effective type of formative assessment, especially when used to practice newly learned concepts or topics. While there is a growing body of research in computing education on the use of large language models (LLMs) to support learning content development, the use of LLMs for producing high-quality code completion problems has not yet been explored. In this paper, we analyze the capability of LLMs to generate effective distractors (i.e., plausible but incorrect options) and explanations for completion problems. We utilize common student misconceptions to improve the quality of the generated distractors. Our study suggests that LLMs are capable of generating reasonable distractors and explanations. At the same time, we identify the lack of a sufficiently granular taxonomy of common student misconceptions that would be needed for aligning the generated distractors with common misconceptions and errors -- a gap that should be addressed in future work.
Free, publicly-accessible full text available May 14, 2026.
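The following is a minimal sketch of how a misconception-grounded distractor request to an LLM could look. The prompt wording, example problem, and model name are assumptions for illustration only, not the prompts used in the study; it assumes a standard chat-completions API.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

problem = """Complete the missing line so the loop sums the elements of arr:
int total = 0;
for (int i = 0; i < arr.length; i++) {
    ____
}"""
correct = "total += arr[i];"
misconception = "Students often confuse plain assignment (=) with accumulation (+=)."

prompt = (
    "You are helping author a code completion exercise for an introductory Java course.\n"
    f"Problem:\n{problem}\n\n"
    f"Correct completion: {correct}\n"
    f"Known misconception: {misconception}\n"
    "Generate three plausible but incorrect completions (distractors), each with a "
    "one-sentence explanation of why a student holding this misconception might pick it."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```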
-
As large language models (LLMs) show great promise in generating a wide spectrum of educational materials, robust yet cost-effective assessment of the quality and effectiveness of such materials becomes an important challenge. Traditional approaches, including expert-based quality assessment and student-centered evaluation, are resource-intensive and do not scale efficiently. In this work, we explored the use of pre-existing student learning data as a promising approach to evaluating LLM-generated learning materials. Specifically, we used a dataset where students completed program construction challenges by picking the correct answers among human-authored distractors to evaluate the quality of LLM-generated distractors for the same challenges. The dataset included responses from 1,071 students across 22 classes taught from Fall 2017 to Spring 2023. We evaluated five prominent LLMs (OpenAI-o1, GPT-4, GPT-4o, GPT-4o-mini, and Llama-3.1-8b) across three different prompts to see which combinations result in more effective distractors, i.e., those that are plausible (often picked by students) and potentially based on common misconceptions. Our results suggest that GPT-4o was the most effective model, matching close to 50% of the functional distractors originally authored by humans. At the same time, all of the evaluated LLMs generated many novel distractors, i.e., those that did not match the pre-existing human-authored ones. Our preliminary analysis shows that these appear to be promising; establishing their effectiveness in real-world classroom settings is left for future work.
Free, publicly-accessible full text available March 3, 2026.
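A minimal sketch of the matching idea follows: compare LLM-generated distractors to pre-existing human-authored ones and use how often students picked each human-authored option as a plausibility signal. The data structures, option strings, pick counts, and whitespace-based normalization rule are illustrative assumptions, not the paper's actual pipeline.

```python
def normalize(code: str) -> str:
    # Collapse whitespace so trivially different formatting still matches.
    return " ".join(code.split())

human_distractors = {          # option -> number of students who picked it
    "total = arr[i];": 37,
    "total += i;": 12,
    "total == arr[i];": 4,
}
llm_distractors = ["total = arr[i];", "total += arr[ i ];", "sum += arr[i];"]

matched, novel = [], []
for d in llm_distractors:
    hits = [h for h in human_distractors if normalize(h) == normalize(d)]
    (matched if hits else novel).append(d)

match_rate = len(matched) / len(llm_distractors)
print(f"matched {len(matched)}/{len(llm_distractors)} ({match_rate:.0%}); novel: {novel}")
```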
-
Many educational recommender systems (EdRecSys) rely on commercial recommendation strategies that emphasize content relevance while neglecting learners’ views on recommendation effectiveness. To address this, we conducted a co-design study with computer science students in an introductory programming course to explore their vision of an ideal EdRecSys. The participants shared preferences and concerns related to three areas: recommendation approaches, transparency, and control. We used Zimmerman’s model of self-regulated learning to contextualize their expectations within a broader educational framework. The findings offer actionable insights for designing learner-centered AIED systems that foster engagement, agency, and self-regulation.
Free, publicly-accessible full text available January 1, 2026.
-
Worked examples have consistently demonstrated their value in education, serving as model solutions for solving specific problem types. Past studies indicate that combining worked examples with practice problems is more effective than providing either problems or examples in isolation. Despite these findings, the exploration of the effects of grouping worked examples and problems for programming practice is limited, especially in learning environments designed for practice. This paper compares two content organization approaches in a practice system. The first explicitly connects worked examples and completion problems, allowing students to access them in smaller bundles. The second delivers the same set of activities separately but keeps an implicit connection by grouping them under a topic. We examined the effects of these two approaches on student engagement and performance in a semester-long classroom experiment conducted in a CS1 programming course. The results indicate that explicitly connecting worked examples and completion problems increased engagement with the completion problems and supported problem-solving performance, leading to higher success rates and persistence.
-
Worked code examples are among the most popular types of learning content in programming classes. Most approaches and tools for presenting these examples to students are based on line-by-line explanations of the example code. However, instructors rarely have time to provide line-by-line explanations for the large number of examples typically used in a programming class. This paper explores the opportunity to facilitate the development of worked examples for Java programming through a human-AI collaborative authoring approach. The idea of collaborative authoring is to generate a starting version of code explanations using an LLM and present it to the instructor to edit if necessary. The critical step towards implementing this idea is to ensure that the LLM can produce code explanations that look meaningful and acceptable to instructors and students. To achieve this goal, we performed an extensive prompt engineering study and evaluated the explanations produced by the selected prompt in a user study with students and authors.
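As a rough illustration of the collaborative-authoring step, the sketch below asks an LLM for a draft of line-by-line explanations of a small Java snippet, which would then be handed to the instructor for editing. The prompt text, snippet, and model name are assumptions for illustration, not the prompt selected in the paper's prompt engineering study.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

java_snippet = """1: int[] nums = {3, 1, 4};
2: int max = nums[0];
3: for (int i = 1; i < nums.length; i++) {
4:     if (nums[i] > max) max = nums[i];
5: }"""

prompt = (
    "Explain the following Java code line by line for a CS1 student.\n"
    "Return one short explanation per numbered line, keeping the numbering.\n\n"
    + java_snippet
)

draft = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

# The draft is shown to the instructor for editing rather than published
# directly -- that review is the human half of the collaboration.
print(draft)
```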
-
Worked examples are among the most popular types of learning content in programming classes. However, instructors rarely have time to provide line-by-line explanations for the large number of examples typically used in a programming class. In this paper, we explore and assess a human-AI collaboration approach to authoring worked examples for Java programming. We introduce an authoring system for creating Java worked examples that generates a starting version of code explanations and presents it to the instructor to edit if necessary. We also present a study that assesses the quality of explanations created with this approach.
-
Worked examples (solutions to typical programming problems, presented as source code in a particular language and used to explain topics from a programming class) are among the most popular types of learning content in programming classes. Most approaches and tools for presenting these examples to students are based on line-by-line explanations of the example code. However, instructors rarely have time to provide line-by-line explanations for the large number of examples typically used in a programming class. In this paper, we explore and assess a human-AI collaboration approach to authoring worked examples for Java programming. We introduce an authoring system for creating Java worked examples that generates a starting version of code explanations and presents it to the instructor to edit if necessary. We also present a study that assesses the quality of explanations created with this approach.
-
We present the results of a study in which we provided students with textual explanations for learning content recommendations along with adaptive navigational support, in the context of a personalized system for practicing Java programming. We evaluated how varying the modality of access (no access vs. on-mouseover vs. on-click) influences how students interact with the learning platform and work with both recommended and non-recommended content. We found that students' persistence when solving recommended coding problems is correlated with their learning gain, and that specific student-engagement metrics can be supported by the design of adequate navigational support and access to the explanations of recommendations.
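A minimal sketch of the kind of correlation reported above is shown below, assuming persistence is measured as the average number of attempts on recommended problems and learning gain as the post-test minus pre-test score; both the metric definitions and the values are hypothetical.

```python
from scipy.stats import pearsonr

# Hypothetical per-student metrics (illustrative values only):
# persistence   = average number of attempts on recommended problems
# learning_gain = post-test score minus pre-test score
persistence   = [1.2, 2.5, 3.1, 1.8, 2.9, 4.0, 2.2, 3.5]
learning_gain = [0.05, 0.15, 0.22, 0.08, 0.18, 0.30, 0.12, 0.25]

r, p = pearsonr(persistence, learning_gain)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```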
-
Educational data mining research has demonstrated that the large volume of learning data collected by modern e-learning systems could be used to recognize student behavior patterns and group students into cohorts with similar behavior. However, few attempts have been made to connect and compare behavioral patterns with known dimensions of individual differences. To what extent is learner behavior defined by known individual differences? Which of them could be a better predictor of learner engagement and performance? Could we use behavior patterns to build a data-driven model of individual differences that is more useful for predicting critical outcomes of the learning process than traditional models? Our paper attempts to answer these questions using a large volume of learner data collected in an online practice system. We apply a sequential pattern mining approach to build individual models of learner practice behavior and reveal latent student subgroups that exhibit considerably different practice behavior. Using these models, we explored the connections between learner behavior and both the incoming and outgoing parameters of the learning process. Among incoming parameters, we examined traditionally collected individual differences such as self-esteem, gender, and knowledge monitoring skills. We also attempted to bridge the gap between cluster-based behavior pattern models and traditional scale-based models of individual differences by quantifying learner behavior on a latent data-driven scale. Our research shows that this data-driven model of individual differences performs significantly better than traditional models of individual differences in predicting important parameters of the learning process, such as performance and engagement.
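As a rough sketch of the general idea, the example below represents each learner's practice log as action-bigram counts (a simple stand-in for mined sequential patterns) and clusters learners into behavior subgroups. The action labels, feature choice, and number of clusters are illustrative assumptions and do not reproduce the paper's actual pipeline.

```python
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.cluster import KMeans

# One action sequence per learner (e.g., E = example viewed, P = problem attempted)
logs = {
    "s1": ["E", "E", "P", "P", "P"],
    "s2": ["P", "P", "P", "P", "E"],
    "s3": ["E", "P", "E", "P", "E"],
    "s4": ["P", "P", "P", "P", "P"],
}

def bigram_counts(seq):
    # Count consecutive action pairs as simple behavioral features.
    return Counter(zip(seq, seq[1:]))

features = [{f"{a}->{b}": c for (a, b), c in bigram_counts(seq).items()}
            for seq in logs.values()]

X = DictVectorizer(sparse=False).fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for student, cluster in zip(logs, labels):
    print(student, "-> behavior subgroup", cluster)
```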